NSF PAR Search | NSF Public Access Repository

Joint mirror procedure: controlling false discovery rate for identifying simultaneous signals

https://doi.org/10.1093/biomtc/ujae142

Deng, Linsui; He, Kejun; Zhang, Xianyang (October 2024, Biometrics)

ABSTRACT In many applications, the process of identifying a specific feature of interest often involves testing multiple hypotheses for their joint statistical significance. Examples include mediation analysis, which simultaneously examines the existence of the exposure-mediator and the mediator-outcome effects, and replicability analysis, aiming to identify simultaneous signals that exhibit statistical significance across multiple independent studies. In this work, we present a new approach called the joint mirror (JM) procedure that effectively detects such features while maintaining false discovery rate (FDR) control in finite samples. The JM procedure employs an iterative method that gradually shrinks the rejection region based on progressively revealed information until a conservative estimate of the false discovery proportion is below the target FDR level. Additionally, we introduce a more stringent error measure known as the composite FDR (cFDR), which assigns weights to each false discovery based on its number of null components. We use the leave-one-out technique to prove that the JM procedure controls the cFDR in finite samples. To implement the JM procedure, we propose an efficient algorithm that can incorporate partial ordering information. Through extensive simulations, we show that our procedure effectively controls the cFDR and enhances statistical power across various scenarios, including the case in which test statistics are dependent across the features. Finally, we showcase the utility of our method by applying it to real-world mediation and replicability analyses.
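The mirror idea in the abstract above can be illustrated with a deliberately simplified sketch. It is not the authors' JM algorithm, which adapts the region shape using progressively revealed information and partial ordering. Instead it assumes a square rejection box {all p_k <= t}, independent component p-values that are uniform under their nulls, and mirror regions formed by reflecting one coordinate at a time (p_k -> 1 - p_k). Under these assumptions, a feature with several null components is counted several times, which loosely mirrors the cFDR weighting. The function names, the "+1" correction, and the threshold grid are all hypothetical choices.

```python
import numpy as np


def mirror_fdp_estimate(p, t):
    """Conservative FDP estimate for the box {all p_k <= t} (hypothetical sketch).

    p: (n_features, K) array of p-values, one column per component hypothesis.
    Each mirror region reflects exactly one coordinate (p_k -> 1 - p_k). A uniform
    null p-value is symmetric about 0.5, so mirror counts stand in for null
    components falling inside the box; multi-null features are counted repeatedly.
    """
    inside = np.all(p <= t, axis=1)
    n_reject = int(inside.sum())
    n_mirror = 0
    for k in range(p.shape[1]):
        reflected = p.copy()
        reflected[:, k] = 1.0 - reflected[:, k]
        n_mirror += int(np.all(reflected <= t, axis=1).sum())
    # The "+1" is an assumed finite-sample correction, as in knockoff-type rules.
    return (1 + n_mirror) / max(n_reject, 1), inside


def shrinking_box_procedure(p, q, n_steps=500):
    """Shrink t from 0.5 toward 0 until the FDP estimate falls to q or below."""
    for t in np.linspace(0.5, 0.0, n_steps + 1)[:-1]:
        fdp_hat, inside = mirror_fdp_estimate(p, t)
        if fdp_hat <= q:
            return np.flatnonzero(inside), float(t)
    return np.array([], dtype=int), 0.0
```

For example, 20 features whose two p-values are both 0.001 are all rejected at the first threshold (t = 0.5), because no reflected point lands in any mirror region and the estimate is 1/20 = 0.05. Shrinking a single scalar threshold keeps the sketch short; the JM procedure's region updates are more flexible.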
Full Text Available
Multi-way overlapping clustering by Bayesian tensor decomposition

https://doi.org/10.4310/23-SII790

Wang, Zhuofan; Zhou, Fangting; He, Kejun; Ni, Yang (January 2024, Statistics and Its Interface)

Full Text Available
Bayesian Nonlinear Tensor Regression with Functional Fused Elastic Net Prior

https://doi.org/10.1080/00401706.2023.2197471

Chen, Shuoli; He, Kejun; He, Shiyuan; Ni, Yang; Wong, Raymond K. W. (October 2023, Technometrics)

Full Text Available
Functional Bayesian networks for discovering causality from multivariate functional data

https://doi.org/10.1111/biom.13922

Zhou, Fangting; He, Kejun; Wang, Kunbo; Xu, Yanxun; Ni, Yang (August 2023, Biometrics)

Abstract Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest. In this paper, we develop a novel Bayesian network (BN) model for multivariate functional data where conditional independencies and causal structure are encoded by a directed acyclic graph. Specifically, we allow the functional objects to deviate from Gaussian processes, which is key to unique causal structure identification even when the functions are measured with noise. A fully Bayesian framework is designed to infer the functional BN model with natural uncertainty quantification through posterior summaries. Simulation studies and real data examples demonstrate the practical utility of the proposed model.
Full Text Available
Individualized Causal Discovery with Latent Trajectory Embedded Bayesian Networks

https://doi.org/10.1111/biom.13843

Zhou, Fangting; He, Kejun; Ni, Yang (February 2023, Biometrics)

Abstract Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such an assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while causal Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of causal effect heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as in inferring observation-specific gene regulatory networks from observational data.
Full Text Available
LinDA: linear models for differential abundance analysis of microbiome compositional data

https://doi.org/10.1186/s13059-022-02655-5

Zhou, Huijuan; He, Kejun; Chen, Jun; Zhang, Xianyang (December 2022, Genome Biology)

Abstract Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that the compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio transformed data, and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.
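The two steps named in the LinDA abstract (regression on centered log-ratio data, then a compositional bias correction) can be sketched as below. This is a minimal illustration, not the LinDA package: it assumes a single covariate, a pseudo-count of 0.5, and ordinary least squares per taxon. It also uses the median of the slopes as a stand-in bias estimate, on the assumption that most taxa are not differentially abundant. The CLR step subtracts each sample's mean log abundance, so a change in one taxon shifts every taxon's CLR slope by a common offset, and a robust center of the slopes can recover that offset. LinDA's own bias estimator and its inference steps are not reproduced here.

```python
import numpy as np


def clr(counts, pseudo=0.5):
    """Centered log-ratio transform per sample (row); the pseudo-count is an assumption."""
    log_x = np.log(counts + pseudo)
    return log_x - log_x.mean(axis=1, keepdims=True)


def clr_ols_with_bias_correction(counts, x, pseudo=0.5):
    """Per-taxon OLS of CLR values on x, then subtract a common compositional offset.

    counts: (n_samples, n_taxa) array; x: (n_samples,) covariate.
    Returns bias-corrected slopes, one per taxon (hypothetical sketch).
    """
    y = clr(counts, pseudo)
    design = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # shape (2, n_taxa)
    slopes = coef[1]
    bias = np.median(slopes)  # stand-in for LinDA's bias estimate
    return slopes - bias
```

In a noise-free toy case with five taxa where only taxon 0 rises by 2 on the log scale in the second group, the raw CLR slopes are 1.6 for taxon 0 and -0.4 for the others; subtracting the median (-0.4) gives 2 for taxon 0 and 0 elsewhere.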
Full Text Available
Causal Discovery with Heterogeneous Observational Data

Zhou, Fangting; He, Kejun; Ni, Yang (January 2022, Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence)

Full Text Available
Tensor Linear Regression: Degeneracy and Solution

https://doi.org/10.1109/ACCESS.2021.3049494

Zhou, Ya; Wong, Raymond K.; He, Kejun (January 2021, IEEE Access)
Full Text Available
Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization

https://doi.org/10.1093/biostatistics/kxab002

Zhou, Fangting; He, Kejun; Li, Qiwei; Chapkin, Robert S; Ni, Yang (February 2021, Biostatistics)
Summary High-throughput sequencing technology provides unprecedented opportunities to quantitatively explore the human gut microbiome and its relation to diseases. Microbiome data are compositional, sparse, noisy, and heterogeneous, properties that pose serious challenges for statistical modeling. We propose an identifiable Bayesian multinomial matrix factorization model to infer overlapping clusters on both microbes and hosts. The proposed method represents the observed over-dispersed zero-inflated count matrix as Dirichlet-multinomial mixtures on which latent cluster structures are built hierarchically. Under the Bayesian framework, the number of clusters is automatically determined and available information from a taxonomic rank tree of microbes is naturally incorporated, which greatly improves the interpretability of our findings. We demonstrate the utility of the proposed approach by comparing it with alternative methods in simulations. An application to a human gut microbiome data set involving patients with inflammatory bowel disease reveals interesting clusters, which contain the bacterial families Bacteroidaceae, Bifidobacteriaceae, Enterobacteriaceae, Fusobacteriaceae, Lachnospiraceae, Ruminococcaceae, Pasteurellaceae, and Porphyromonadaceae that are known to be related to inflammatory bowel disease and its subtypes according to the biological literature. Our findings can help generate potential hypotheses for future investigation of the heterogeneity of the human gut microbiome.
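The Dirichlet-multinomial building block mentioned in the summary above can be illustrated in a few lines. This sketch covers only that component, not the paper's biclustering or factorization model. Each sample first draws taxon proportions from a Dirichlet distribution and then draws counts from a multinomial with those proportions, which produces more between-sample variance than a plain multinomial with the same mean. The concentration vector `alpha`, the sequencing depth, and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np


def sample_dirichlet_multinomial(alpha, depth, n_samples, rng):
    """Proportions ~ Dirichlet(alpha), then counts ~ Multinomial(depth, proportions)."""
    props = rng.dirichlet(alpha, size=n_samples)  # (n_samples, n_taxa)
    return np.array([rng.multinomial(depth, pr) for pr in props])


rng = np.random.default_rng(1)
alpha = np.array([0.5, 0.5, 1.0])  # small total concentration -> strong over-dispersion
depth = 100
dm = sample_dirichlet_multinomial(alpha, depth, 2000, rng)
mult = rng.multinomial(depth, alpha / alpha.sum(), size=2000)  # same mean, no extra dispersion
print(dm[:, 0].var(), mult[:, 0].var())
```

With these settings, the theoretical variance of the first taxon's count is about 18.75 under the multinomial and roughly 34 times larger under the Dirichlet-multinomial (the factor is (depth + sum(alpha)) / (1 + sum(alpha))). This inflation is what "over-dispersed" refers to. Zero inflation would need an additional component that is not shown here.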
Full Text Available
Dimensionality Reduction and Variable Selection in Multivariate Varying-Coefficient Models With a Large Number of Covariates

https://doi.org/10.1080/01621459.2017.1285774

He, Kejun; Lian, Heng; Ma, Shujie; Huang, Jianhua Z. (February 2017, Journal of the American Statistical Association)

Full Text Available